{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "07f4c93e-e2c4-4e0e-a4c6-afc5391dad40",
   "metadata": {},
   "source": [
    "# Using Label Studio for Annotations with Pixeltable\n",
    "\n",
    "This tutorial demonstrates how to integrate Pixeltable with Label Studio, in order to provide seamless management of annotations data across the annotation workflow. We'll assume that you're at least somewhat familiar with Pixeltable and have read the [Pixeltable Basics](https://pixeltable.readme.io/docs/pixeltable-basics) tutorial.\n",
    "\n",
    "__This tutorial can only be run in a local Pixeltable installation, not in Colab or Kaggle__, since it relies on spinning up a locally running Label Studio instance. See the [Installation Guide](https://pixeltable.readme.io/docs/installation) for instructions on how to set up a local Pixeltable instance.\n",
    "\n",
    "To begin, let's ensure the requisite dependencies are installed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba177f97-6bbc-4c2a-8c81-a6eecb8a8e43",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -q pixeltable label-studio label-studio-sdk torch transformers"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80ba2580-8f6b-4292-8280-acbdf9b6e4f2",
   "metadata": {},
   "source": [
    "## Set up Label Studio\n",
    "\n",
    "Now let's spin up a Label Studio server process. (If you're already running Label Studio, you can choose to skip this step, and instead enter your existing Label Studio URL and access token in the subsequent step.) Be patient, as it may take a minute or two to start.\n",
    "\n",
    "This will open a new browser window containing the Label Studio interface. If you've never run Label Studio before, you'll need to create an account; a link to create one will appear in the Label Studio browser window. __Everything is running locally in this tutorial, so the account will exist only on your local system.__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "91c032cb-472d-4e66-9594-8edebf527b66",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Performing system checks...\n",
      "\n",
      "System check identified no issues (1 silenced).\n",
      "August 01, 2024 - 21:54:20\n",
      "Django version 3.2.25, using settings 'label_studio.core.settings.label_studio'\n",
      "Starting development server at http://0.0.0.0:8080/\n",
      "Quit the server with CONTROL-C.\n"
     ]
    }
   ],
   "source": [
    "import subprocess\n",
    "ls_process = subprocess.Popen(['label-studio'], stderr=subprocess.PIPE)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dcfa1986-c742-4097-b200-6c195f22cdd2",
   "metadata": {},
   "source": [
    "If for some reason the Label Studio browser window failed to open, you can always access it at: http://localhost:8080/\n",
    "\n",
    "Once you've created an account in Label Studio, you'll need to locate your API key. In the Label Studio browser window, log in, and click on \"Account & Settings\" in the top right. Copy the Access Token from the interface."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f780b92-b698-4c5a-b5a1-3ff4b4d6a45b",
   "metadata": {},
   "source": [
    "## Configure Pixeltable\n",
    "\n",
    "Next, we configure Pixeltable to communicate with Label Studio. Run the following command, pasting in the API key that you copied from the Label Studio interface."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "d8d5660c-7986-4558-bbb7-8eca1ae7590f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdin",
     "output_type": "stream",
     "text": [
      "Label Studio API key:  ········\n"
     ]
    }
   ],
   "source": [
    "import getpass\n",
    "import os\n",
    "\n",
    "if 'LABEL_STUDIO_URL' not in os.environ:\n",
    "    os.environ['LABEL_STUDIO_URL'] = 'http://localhost:8080/'\n",
    "\n",
    "if 'LABEL_STUDIO_API_KEY' not in os.environ:\n",
    "    os.environ['LABEL_STUDIO_API_KEY'] = getpass.getpass('Label Studio API key: ')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5694554a-8a81-4dd8-b351-a89260ef3bf9",
   "metadata": {},
   "source": [
    "## Create a Table to Store Videos\n",
    "\n",
    "Now we create the master table that will hold our videos to be annotated. This only needs to be done once, when we initially set up the workflow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "876c5342-d166-4cf1-92a8-8130308697e3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Connected to Pixeltable database at: postgresql://postgres:@/pixeltable?host=/Users/asiegel/.pixeltable/pgdata\n",
      "Created directory `ls_demo`.\n",
      "Created table `videos`.\n"
     ]
    }
   ],
   "source": [
    "import pixeltable as pxt\n",
    "\n",
    "schema = {\n",
    "    'video': pxt.VideoType(),\n",
    "    'date': pxt.TimestampType()\n",
    "}\n",
    "\n",
    "# Before creating the table, we drop the `ls_demo` dir and all its contents,\n",
    "# in order to ensure a clean environment for the demo.\n",
    "pxt.drop_dir('ls_demo', force=True)\n",
    "pxt.create_dir('ls_demo')\n",
    "videos_table = pxt.create_table('ls_demo.videos', schema)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4d25fe62-b59c-4602-a0ce-46170f378150",
   "metadata": {},
   "source": [
    "## Populate It with Data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e8357ad-345d-4273-a3ec-8711f0b48245",
   "metadata": {},
   "source": [
    "Now let's add some videos to the table to populate it. For this tutorial, we'll use some randomly selected videos from the Multimedia Commons archive. The table also contains a `date` field, for which we'll use a fixed date (but in a production setting, it would typically be the date on which the video was imported)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "29bea73f-c455-4e8f-aed7-ee34faa421c3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Inserting rows into `videos`: 3 rows [00:00, 881.28 rows/s]\n",
      "Inserted 3 rows with 0 errors.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "UpdateStatus(num_rows=3, num_computed_values=0, num_excs=0, updated_cols=[], cols_with_excs=[])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from datetime import date\n",
    "\n",
    "url_prefix = 'http://multimedia-commons.s3-website-us-west-2.amazonaws.com/data/videos/mp4/'\n",
    "files = [\n",
    "    '122/8ff/1228ff94bf742242ee7c88e4769ad5d5.mp4',\n",
    "    '2cf/a20/2cfa205eae979b31b1144abd9fa4e521.mp4',\n",
    "    'ffe/ff3/ffeff3c6bf57504e7a6cecaff6aefbc9.mp4',\n",
    "]\n",
    "today = date(2024, 4, 22)\n",
    "videos_table.insert({'video': url_prefix + file, 'date': today} for file in files)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4727dd4b-5b21-4ff9-a6a8-4c0e53641841",
   "metadata": {},
   "source": [
    "Let's have a look at the table now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "c984cf21-0e24-4844-bafd-9772ceaeffc7",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>video</th>\n",
       "      <th>date</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_video\" style=\"width:320px;\">\n",
       "            <video controls width=\"320\" poster=\"\">\n",
       "                <source src=\"http://127.0.0.1:59746/Users/asiegel/.pixeltable/file_cache/a6530bfb8907424a96b91ea6e4447132_0_e5230e98c24095daeaf27d2610f31f523dcd010968b7907d9a6cffb8fb59a5ec.mp4\" type=\"video/mp4\" />\n",
       "            </video>\n",
       "        </div></td>\n",
       "      <td>2024-04-22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_video\" style=\"width:320px;\">\n",
       "            <video controls width=\"320\" poster=\"\">\n",
       "                <source src=\"http://127.0.0.1:59746/Users/asiegel/.pixeltable/file_cache/a6530bfb8907424a96b91ea6e4447132_0_4dd99a025363e526d7f2f8d57fd46a4c2a100e116b6400aa3032a4e2b3c8fb08.mp4\" type=\"video/mp4\" />\n",
       "            </video>\n",
       "        </div></td>\n",
       "      <td>2024-04-22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_video\" style=\"width:320px;\">\n",
       "            <video controls width=\"320\" poster=\"\">\n",
       "                <source src=\"http://127.0.0.1:59746/Users/asiegel/.pixeltable/file_cache/a6530bfb8907424a96b91ea6e4447132_0_e9a8b8990c2543668e647eb595c43efa5634215cc43508b13fe18f80465b0f46.mp4\" type=\"video/mp4\" />\n",
       "            </video>\n",
       "        </div></td>\n",
       "      <td>2024-04-22</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "                                               video       date\n",
       "0  /Users/asiegel/.pixeltable/file_cache/a6530bfb... 2024-04-22\n",
       "1  /Users/asiegel/.pixeltable/file_cache/a6530bfb... 2024-04-22\n",
       "2  /Users/asiegel/.pixeltable/file_cache/a6530bfb... 2024-04-22"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "videos_table.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f114e857-e749-4dc9-977d-8bb676b3c660",
   "metadata": {},
   "source": [
    "## Create a Label Studio project\n",
    "\n",
    "Next we'll create a new Label Studio project and link it to a new view on the Pixeltable table. You can link a Label Studio project to either a table or a view. For tables that are expecting a lot of input data, it's often easier to link to views. In this example, we'll create a view that filters the table down by date."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "65857a69-582a-43ca-b370-caa2799f3df1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Inserting rows into `videos_2024_04_22`: 3 rows [00:00, 2022.98 rows/s]\n",
      "Created view `videos_2024_04_22` with 3 rows, 0 exceptions.\n",
      "Added 3 column values with 0 errors.\n",
      "Computing cells: 100%|███████████████████████████████████████████| 3/3 [00:00<00:00, 909.63 cells/s]\n",
      "Linked external store `ls_project_0` to table `videos_2024_04_22`.\n",
      "Created 3 new task(s) in LabelStudioProject `videos_2024_04_22`.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "SyncStatus(external_rows_created=3, external_rows_deleted=0, external_rows_updated=0, pxt_rows_updated=0, num_excs=0)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a view to filter on the specified date\n",
    "\n",
    "v = pxt.create_view(\n",
    "    'ls_demo.videos_2024_04_22',\n",
    "    videos_table.where(videos_table.date == today)\n",
    ")\n",
    "\n",
    "# Create a new Label Studio project and link it to the view. The\n",
    "# configuration uses Label Studio's standard XML format. This only\n",
    "# needs to be done once: after the view and project are linked,\n",
    "# the relationship is stored indefinitely in Pixeltable's metadata.\n",
    "\n",
    "label_config = '''\n",
    "    <View>\n",
    "      <Video name=\"video\" value=\"$video\"/>\n",
    "      <Choices name=\"video-category\" toName=\"video\" showInLine=\"true\">\n",
    "        <Choice value=\"city\"/>\n",
    "        <Choice value=\"food\"/>\n",
    "        <Choice value=\"sports\"/>\n",
    "      </Choices>\n",
    "    </View>\n",
    "    '''\n",
    "\n",
    "pxt.io.create_label_studio_project(v, label_config)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a64d0394-ab86-427c-8b7e-5a71f9ece6bc",
   "metadata": {},
   "source": [
    "If you look in the Label Studio UI now, you'll see that there's a new project with the name `videos_2022_04_22`, with three tasks, one for each of the videos in the view. If you want to create the project without populating it with tasks (yet), you can set `sync_immediately=False` in the call to `create_label_studio_project()`. You can always sync the table and project by calling `v.sync()`.\n",
    "\n",
    "Note also that we didn't have to specify an explicit mapping between Pixeltable columns and Label Studio data fields. This is because, by default, Pixeltable assumes the Pixeltable and Label Studio field names coincide. The data field in the Label Studio project has the name `$video`, which Pixeltable maps, by default, to the column in `ls_demo.videos_2024_02_22` that is also called `video`. If you want to override this behavior to specify an explicit mapping of columns to fields, you can do that with the `col_mapping` parameter of `create_label_studio_project()`.\n",
    "\n",
    "Inspecting the view, we also see that Pixeltable created an additional column on the view, `annotations`, which will hold the output of our annotations workflow. The name of the output column can also be overridden by specifying a dict entry in `col_mapping` of the form `{'my_col_name': 'annotations'}`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "ccd06368-f770-4203-903d-1ac6dd4c61de",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style type=\"text/css\">\n",
       "#T_a4c44 th {\n",
       "  text-align: center;\n",
       "}\n",
       "#T_a4c44_row0_col0, #T_a4c44_row0_col1, #T_a4c44_row0_col2, #T_a4c44_row1_col0, #T_a4c44_row1_col1, #T_a4c44_row1_col2, #T_a4c44_row2_col0, #T_a4c44_row2_col1, #T_a4c44_row2_col2 {\n",
       "  white-space: pre-wrap;\n",
       "  text-align: left;\n",
       "}\n",
       "</style>\n",
       "<table id=\"T_a4c44\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th id=\"T_a4c44_level0_col0\" class=\"col_heading level0 col0\" >Column Name</th>\n",
       "      <th id=\"T_a4c44_level0_col1\" class=\"col_heading level0 col1\" >Type</th>\n",
       "      <th id=\"T_a4c44_level0_col2\" class=\"col_heading level0 col2\" >Computed With</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td id=\"T_a4c44_row0_col0\" class=\"data row0 col0\" >annotations</td>\n",
       "      <td id=\"T_a4c44_row0_col1\" class=\"data row0 col1\" >json</td>\n",
       "      <td id=\"T_a4c44_row0_col2\" class=\"data row0 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_a4c44_row1_col0\" class=\"data row1 col0\" >video</td>\n",
       "      <td id=\"T_a4c44_row1_col1\" class=\"data row1 col1\" >video</td>\n",
       "      <td id=\"T_a4c44_row1_col2\" class=\"data row1 col2\" ></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td id=\"T_a4c44_row2_col0\" class=\"data row2 col0\" >date</td>\n",
       "      <td id=\"T_a4c44_row2_col1\" class=\"data row2 col1\" >timestamp</td>\n",
       "      <td id=\"T_a4c44_row2_col2\" class=\"data row2 col2\" ></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n"
      ],
      "text/plain": [
       "view 'videos_2024_04_22'\n",
       "\n",
       "Column Name      Type Computed With\n",
       "annotations      json              \n",
       "      video     video              \n",
       "       date timestamp              "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "v"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff94777f-dd99-42ee-afe2-1f5684d99929",
   "metadata": {},
   "source": [
    "## Add Some Annotations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c68ab5c2-eb10-4eec-abb6-72fc9b8371d2",
   "metadata": {},
   "source": [
    "Now, let's add some annotations to our Label Studio project to simulate a human-in-the-loop workflow. In the Label Studio UI, click on the new `videos_2024_02_22` project, and click on any of the three tasks. Select the appropriate category (\"city\", \"food\", or \"sports\"), and click \"Submit\"."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0956c5e8-d48a-4e4a-b594-6c07e4664083",
   "metadata": {},
   "source": [
    "## Import the Annotations Back To Pixeltable"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0f2a373-f1b2-4bdc-941a-41ca85dec038",
   "metadata": {},
   "source": [
    "Now let's try importing annotations from Label Studio back to our view."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "8aa21d6a-c3f8-4a80-921d-0e7df22e3a40",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Created 0 new task(s) in LabelStudioProject `videos_2024_04_22`.\n",
      "Updated annotation(s) from 1 task(s) in LabelStudioProject `videos_2024_04_22`.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "SyncStatus(external_rows_created=0, external_rows_deleted=0, external_rows_updated=0, pxt_rows_updated=1, num_excs=0)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "v = pxt.get_table('ls_demo.videos_2024_04_22')\n",
    "v.sync()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "013633e5-1802-47b9-b84f-81c3b3e4ac81",
   "metadata": {},
   "source": [
    "Let's see what effect that had. You'll see that any videos that you annotated now have their `annotations` field populated in the view."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "006268eb-6f96-40f8-b1e2-f96eb7aeee9c",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>video</th>\n",
       "      <th>annotations</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_video\" style=\"width:320px;\">\n",
       "            <video controls width=\"320\" poster=\"\">\n",
       "                <source src=\"http://127.0.0.1:59746/Users/asiegel/.pixeltable/file_cache/a6530bfb8907424a96b91ea6e4447132_0_e5230e98c24095daeaf27d2610f31f523dcd010968b7907d9a6cffb8fb59a5ec.mp4\" type=\"video/mp4\" />\n",
       "            </video>\n",
       "        </div></td>\n",
       "      <td>[{&quot;id&quot;: 33, &quot;task&quot;: 99, &quot;result&quot;: [{&quot;id&quot;: &quot;nNQBOQuMTJ&quot;, &quot;type&quot;: &quot;choices&quot;, &quot;value&quot;: {&quot;choices&quot;: [&quot;sports&quot;]}, &quot;origin&quot;: &quot;manual&quot;, &quot;to_name&quot;: &quot;video&quot;, &quot;from_name&quot;: &quot;video-category&quot;}], &quot;project&quot;: 102, &quot;import_id&quot;: null, &quot;lead_time&quot;: 2.962, &quot;created_at&quot;: &quot;2024-08-01T21:54:48.880563Z&quot;, &quot;updated_at&quot;: &quot;2024-08-01T21:54:48.880602Z&quot;, &quot;updated_by&quot;: 2, &quot;created_ago&quot;: &quot;0\\u00a0minutes&quot;, &quot;last_action&quot;: null, &quot;completed_by&quot;: 2, &quot;ground_truth&quot;: false, &quot;was_cancelled&quot;: false, &quot;last_created_by&quot;: null, &quot;created_username&quot;: &quot; asiegel@pixeltable.com, 2&quot;, &quot;draft_created_at&quot;: null, &quot;parent_annotation&quot;: null, &quot;parent_prediction&quot;: null}]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_video\" style=\"width:320px;\">\n",
       "            <video controls width=\"320\" poster=\"\">\n",
       "                <source src=\"http://127.0.0.1:59746/Users/asiegel/.pixeltable/file_cache/a6530bfb8907424a96b91ea6e4447132_0_4dd99a025363e526d7f2f8d57fd46a4c2a100e116b6400aa3032a4e2b3c8fb08.mp4\" type=\"video/mp4\" />\n",
       "            </video>\n",
       "        </div></td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_video\" style=\"width:320px;\">\n",
       "            <video controls width=\"320\" poster=\"\">\n",
       "                <source src=\"http://127.0.0.1:59746/Users/asiegel/.pixeltable/file_cache/a6530bfb8907424a96b91ea6e4447132_0_e9a8b8990c2543668e647eb595c43efa5634215cc43508b13fe18f80465b0f46.mp4\" type=\"video/mp4\" />\n",
       "            </video>\n",
       "        </div></td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "                                               video  \\\n",
       "0  /Users/asiegel/.pixeltable/file_cache/a6530bfb...   \n",
       "1  /Users/asiegel/.pixeltable/file_cache/a6530bfb...   \n",
       "2  /Users/asiegel/.pixeltable/file_cache/a6530bfb...   \n",
       "\n",
       "                                         annotations  \n",
       "0  [{'id': 33, 'task': 99, 'result': [{'id': 'nNQ...  \n",
       "1                                               None  \n",
       "2                                               None  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "v.select(v.video, v.annotations).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "955e0485-f026-4a2c-97fc-07f61d979da7",
   "metadata": {},
   "source": [
    "## Parse Annotations with a Computed Column\n",
    "\n",
    "Pixeltable pulls in all sorts of metadata from Label Studio during a sync: everything that Label Studio reports back about the annotations, including things like the user account that created the annotations. Let's say that all we care about is the annotation value. We can add a computed column to our table to pull it out."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "432a983e-f3eb-44bd-a4c9-fab556c3a754",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Computing cells: 100%|███████████████████████████████████████████| 3/3 [00:00<00:00, 381.01 cells/s]\n",
      "Added 3 column values with 0 errors.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>video</th>\n",
       "      <th>annotations</th>\n",
       "      <th>video_category</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_video\" style=\"width:320px;\">\n",
       "            <video controls width=\"320\" poster=\"\">\n",
       "                <source src=\"http://127.0.0.1:59746/Users/asiegel/.pixeltable/file_cache/a6530bfb8907424a96b91ea6e4447132_0_e5230e98c24095daeaf27d2610f31f523dcd010968b7907d9a6cffb8fb59a5ec.mp4\" type=\"video/mp4\" />\n",
       "            </video>\n",
       "        </div></td>\n",
       "      <td>[{&quot;id&quot;: 33, &quot;task&quot;: 99, &quot;result&quot;: [{&quot;id&quot;: &quot;nNQBOQuMTJ&quot;, &quot;type&quot;: &quot;choices&quot;, &quot;value&quot;: {&quot;choices&quot;: [&quot;sports&quot;]}, &quot;origin&quot;: &quot;manual&quot;, &quot;to_name&quot;: &quot;video&quot;, &quot;from_name&quot;: &quot;video-category&quot;}], &quot;project&quot;: 102, &quot;import_id&quot;: null, &quot;lead_time&quot;: 2.962, &quot;created_at&quot;: &quot;2024-08-01T21:54:48.880563Z&quot;, &quot;updated_at&quot;: &quot;2024-08-01T21:54:48.880602Z&quot;, &quot;updated_by&quot;: 2, &quot;created_ago&quot;: &quot;0\\u00a0minutes&quot;, &quot;last_action&quot;: null, &quot;completed_by&quot;: 2, &quot;ground_truth&quot;: false, &quot;was_cancelled&quot;: false, &quot;last_created_by&quot;: null, &quot;created_username&quot;: &quot; asiegel@pixeltable.com, 2&quot;, &quot;draft_created_at&quot;: null, &quot;parent_annotation&quot;: null, &quot;parent_prediction&quot;: null}]</td>\n",
       "      <td>sports</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_video\" style=\"width:320px;\">\n",
       "            <video controls width=\"320\" poster=\"\">\n",
       "                <source src=\"http://127.0.0.1:59746/Users/asiegel/.pixeltable/file_cache/a6530bfb8907424a96b91ea6e4447132_0_4dd99a025363e526d7f2f8d57fd46a4c2a100e116b6400aa3032a4e2b3c8fb08.mp4\" type=\"video/mp4\" />\n",
       "            </video>\n",
       "        </div></td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_video\" style=\"width:320px;\">\n",
       "            <video controls width=\"320\" poster=\"\">\n",
       "                <source src=\"http://127.0.0.1:59746/Users/asiegel/.pixeltable/file_cache/a6530bfb8907424a96b91ea6e4447132_0_e9a8b8990c2543668e647eb595c43efa5634215cc43508b13fe18f80465b0f46.mp4\" type=\"video/mp4\" />\n",
       "            </video>\n",
       "        </div></td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "                                               video  \\\n",
       "0  /Users/asiegel/.pixeltable/file_cache/a6530bfb...   \n",
       "1  /Users/asiegel/.pixeltable/file_cache/a6530bfb...   \n",
       "2  /Users/asiegel/.pixeltable/file_cache/a6530bfb...   \n",
       "\n",
       "                                         annotations video_category  \n",
       "0  [{'id': 33, 'task': 99, 'result': [{'id': 'nNQ...         sports  \n",
       "1                                               None           None  \n",
       "2                                               None           None  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "v['video_category'] = v.annotations[0].result[0].value.choices[0]\n",
    "v.select(v.video, v.annotations, v.video_category).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97e3a868-55d0-4ee3-b480-6f4d0094add6",
   "metadata": {},
   "source": [
    "Another useful operation is the `get_metadata` function, which returns information about the video itself, such as the resolution and codec (independent of Label Studio). Let's add another computed column to hold such metadata."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "9ae666d2-6822-421c-a703-f6280b135d4d",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Computing cells: 100%|███████████████████████████████████████████| 3/3 [00:00<00:00, 142.45 cells/s]\n",
      "Added 3 column values with 0 errors.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>video</th>\n",
       "      <th>annotations</th>\n",
       "      <th>video_category</th>\n",
       "      <th>video_metadata</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_video\" style=\"width:320px;\">\n",
       "            <video controls width=\"320\" poster=\"\">\n",
       "                <source src=\"http://127.0.0.1:59746/Users/asiegel/.pixeltable/file_cache/a6530bfb8907424a96b91ea6e4447132_0_e5230e98c24095daeaf27d2610f31f523dcd010968b7907d9a6cffb8fb59a5ec.mp4\" type=\"video/mp4\" />\n",
       "            </video>\n",
       "        </div></td>\n",
       "      <td>[{&quot;id&quot;: 33, &quot;task&quot;: 99, &quot;result&quot;: [{&quot;id&quot;: &quot;nNQBOQuMTJ&quot;, &quot;type&quot;: &quot;choices&quot;, &quot;value&quot;: {&quot;choices&quot;: [&quot;sports&quot;]}, &quot;origin&quot;: &quot;manual&quot;, &quot;to_name&quot;: &quot;video&quot;, &quot;from_name&quot;: &quot;video-category&quot;}], &quot;project&quot;: 102, &quot;import_id&quot;: null, &quot;lead_time&quot;: 2.962, &quot;created_at&quot;: &quot;2024-08-01T21:54:48.880563Z&quot;, &quot;updated_at&quot;: &quot;2024-08-01T21:54:48.880602Z&quot;, &quot;updated_by&quot;: 2, &quot;created_ago&quot;: &quot;0\\u00a0minutes&quot;, &quot;last_action&quot;: null, &quot;completed_by&quot;: 2, &quot;ground_truth&quot;: false, &quot;was_cancelled&quot;: false, &quot;last_created_by&quot;: null, &quot;created_username&quot;: &quot; asiegel@pixeltable.com, 2&quot;, &quot;draft_created_at&quot;: null, &quot;parent_annotation&quot;: null, &quot;parent_prediction&quot;: null}]</td>\n",
       "      <td>sports</td>\n",
       "      <td>{&quot;size&quot;: 815026, &quot;streams&quot;: [{&quot;type&quot;: &quot;video&quot;, &quot;width&quot;: 640, &quot;frames&quot;: 235, &quot;height&quot;: 480, &quot;duration&quot;: 235235, &quot;metadata&quot;: {&quot;encoder&quot;: &quot;AVC Coding&quot;, &quot;language&quot;: &quot;eng&quot;, &quot;vendor_id&quot;: &quot;[0][0][0][0]&quot;, &quot;handler_name&quot;: &quot;MP4 Video Media Handler&quot;, &quot;creation_time&quot;: &quot;2010-04-27T16:40:32.000000Z&quot;}, &quot;base_rate&quot;: 29.97, &quot;time_base&quot;: 3.333e-05, &quot;average_rate&quot;: 29.97, &quot;guessed_rate&quot;: 29.97, &quot;codec_context&quot;: {&quot;name&quot;: &quot;h264&quot;, &quot;pix_fmt&quot;: &quot;yuv420p&quot;, &quot;profile&quot;: &quot;High&quot;, &quot;codec_tag&quot;: &quot;avc1&quot;}, &quot;duration_seconds&quot;: 7.841}, {&quot;type&quot;: &quot;audio&quot;, &quot;frames&quot;: 339, &quot;duration&quot;: 347135, &quot;metadata&quot;: {&quot;language&quot;: &quot;eng&quot;, &quot;vendor_id&quot;: &quot;[0][0][0][0]&quot;, &quot;handler_name&quot;: &quot;MP4 Sound Media Handler&quot;, &quot;creation_time&quot;: &quot;2010-04-27T16:40:32.000000Z&quot;}, &quot;time_base&quot;: 2.268e-05, &quot;codec_context&quot;: {&quot;name&quot;: &quot;aac&quot;, &quot;profile&quot;: &quot;LC&quot;, &quot;channels&quot;: 2, &quot;codec_tag&quot;: &quot;mp4a&quot;}, &quot;duration_seconds&quot;: 7.872}], &quot;bit_rate&quot;: 828326, &quot;metadata&quot;: {&quot;major_brand&quot;: &quot;mp42&quot;, &quot;creation_time&quot;: &quot;2010-04-27T16:40:32.000000Z&quot;, &quot;minor_version&quot;: &quot;0&quot;, &quot;compatible_brands&quot;: &quot;isom&quot;}, &quot;bit_exact&quot;: false}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_video\" style=\"width:320px;\">\n",
       "            <video controls width=\"320\" poster=\"\">\n",
       "                <source src=\"http://127.0.0.1:59746/Users/asiegel/.pixeltable/file_cache/a6530bfb8907424a96b91ea6e4447132_0_4dd99a025363e526d7f2f8d57fd46a4c2a100e116b6400aa3032a4e2b3c8fb08.mp4\" type=\"video/mp4\" />\n",
       "            </video>\n",
       "        </div></td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>{&quot;size&quot;: 1558736, &quot;streams&quot;: [{&quot;type&quot;: &quot;video&quot;, &quot;width&quot;: 640, &quot;frames&quot;: 450, &quot;height&quot;: 480, &quot;duration&quot;: 450450, &quot;metadata&quot;: {&quot;encoder&quot;: &quot;AVC Coding&quot;, &quot;language&quot;: &quot;eng&quot;, &quot;vendor_id&quot;: &quot;[0][0][0][0]&quot;, &quot;handler_name&quot;: &quot;MP4 Video Media Handler&quot;, &quot;creation_time&quot;: &quot;2009-05-20T00:53:00.000000Z&quot;}, &quot;base_rate&quot;: 29.97, &quot;time_base&quot;: 3.333e-05, &quot;average_rate&quot;: 29.97, &quot;guessed_rate&quot;: 29.97, &quot;codec_context&quot;: {&quot;name&quot;: &quot;h264&quot;, &quot;pix_fmt&quot;: &quot;yuv420p&quot;, &quot;profile&quot;: &quot;High&quot;, &quot;codec_tag&quot;: &quot;avc1&quot;}, &quot;duration_seconds&quot;: 15.015}, {&quot;type&quot;: &quot;audio&quot;, &quot;frames&quot;: 648, &quot;duration&quot;: 663551, &quot;metadata&quot;: {&quot;language&quot;: &quot;eng&quot;, &quot;vendor_id&quot;: &quot;[0][0][0][0]&quot;, &quot;handler_name&quot;: &quot;MP4 Sound Media Handler&quot;, &quot;creation_time&quot;: &quot;2009-05-20T00:53:00.000000Z&quot;}, &quot;time_base&quot;: 2.268e-05, &quot;codec_context&quot;: {&quot;name&quot;: &quot;aac&quot;, &quot;profile&quot;: &quot;LC&quot;, &quot;channels&quot;: 2, &quot;codec_tag&quot;: &quot;mp4a&quot;}, &quot;duration_seconds&quot;: 15.047}], &quot;bit_rate&quot;: 828756, &quot;metadata&quot;: {&quot;major_brand&quot;: &quot;mp42&quot;, &quot;creation_time&quot;: &quot;2009-05-20T00:53:00.000000Z&quot;, &quot;minor_version&quot;: &quot;0&quot;, &quot;compatible_brands&quot;: &quot;isom&quot;}, &quot;bit_exact&quot;: false}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_video\" style=\"width:320px;\">\n",
       "            <video controls width=\"320\" poster=\"\">\n",
       "                <source src=\"http://127.0.0.1:59746/Users/asiegel/.pixeltable/file_cache/a6530bfb8907424a96b91ea6e4447132_0_e9a8b8990c2543668e647eb595c43efa5634215cc43508b13fe18f80465b0f46.mp4\" type=\"video/mp4\" />\n",
       "            </video>\n",
       "        </div></td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>{&quot;size&quot;: 2099014, &quot;streams&quot;: [{&quot;type&quot;: &quot;video&quot;, &quot;width&quot;: 640, &quot;frames&quot;: 600, &quot;height&quot;: 360, &quot;duration&quot;: 600600, &quot;metadata&quot;: {&quot;language&quot;: &quot;eng&quot;, &quot;vendor_id&quot;: &quot;[0][0][0][0]&quot;, &quot;handler_name&quot;: &quot;VideoHandler&quot;}, &quot;base_rate&quot;: 29.97, &quot;time_base&quot;: 3.333e-05, &quot;average_rate&quot;: 29.97, &quot;guessed_rate&quot;: 29.97, &quot;codec_context&quot;: {&quot;name&quot;: &quot;h264&quot;, &quot;pix_fmt&quot;: &quot;yuv420p&quot;, &quot;profile&quot;: &quot;High&quot;, &quot;codec_tag&quot;: &quot;avc1&quot;}, &quot;duration_seconds&quot;: 20.02}, {&quot;type&quot;: &quot;audio&quot;, &quot;frames&quot;: 863, &quot;duration&quot;: 883712, &quot;metadata&quot;: {&quot;language&quot;: &quot;eng&quot;, &quot;vendor_id&quot;: &quot;[0][0][0][0]&quot;, &quot;handler_name&quot;: &quot;SoundHandler&quot;}, &quot;time_base&quot;: 2.268e-05, &quot;codec_context&quot;: {&quot;name&quot;: &quot;aac&quot;, &quot;profile&quot;: &quot;LC&quot;, &quot;channels&quot;: 2, &quot;codec_tag&quot;: &quot;mp4a&quot;}, &quot;duration_seconds&quot;: 20.039}], &quot;bit_rate&quot;: 836844, &quot;metadata&quot;: {&quot;encoder&quot;: &quot;Lavf54.63.104&quot;, &quot;major_brand&quot;: &quot;isom&quot;, &quot;minor_version&quot;: &quot;512&quot;, &quot;compatible_brands&quot;: &quot;isomiso2avc1mp41&quot;}, &quot;bit_exact&quot;: false}</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "                                               video  \\\n",
       "0  /Users/asiegel/.pixeltable/file_cache/a6530bfb...   \n",
       "1  /Users/asiegel/.pixeltable/file_cache/a6530bfb...   \n",
       "2  /Users/asiegel/.pixeltable/file_cache/a6530bfb...   \n",
       "\n",
       "                                         annotations video_category  \\\n",
       "0  [{'id': 33, 'task': 99, 'result': [{'id': 'nNQ...         sports   \n",
       "1                                               None           None   \n",
       "2                                               None           None   \n",
       "\n",
       "                                      video_metadata  \n",
       "0  {'size': 815026, 'streams': [{'type': 'video',...  \n",
       "1  {'size': 1558736, 'streams': [{'type': 'video'...  \n",
       "2  {'size': 2099014, 'streams': [{'type': 'video'...  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pixeltable.functions.video import get_metadata\n",
    "\n",
    "v['video_metadata'] = get_metadata(v.video)\n",
    "v.select(v.video, v.annotations, v.video_category, v.video_metadata).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c8c5394-2539-4160-84df-68cb8df72de9",
   "metadata": {},
   "source": [
    "## Preannotations with Pixeltable and Label Studio"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca674ef6-46f0-4d65-bcd8-58399bdacef2",
   "metadata": {},
   "source": [
    "Frame extraction is another common operation in labeling workflows. In this example, we'll extract frames from our videos into a view, then use an object detection model to generate preannotations for each frame. The following code uses a Pixeltable `FrameIterator` to automatically extract frames into a new view, which we'll call `frames_2024_04_22`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "64ad60b8-fca5-419d-81bb-8c9e640b0037",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Inserting rows into `frames_2024_04_22`: 13 rows [00:00, 5265.66 rows/s]\n",
      "Created view `frames_2024_04_22` with 13 rows, 0 exceptions.\n"
     ]
    }
   ],
   "source": [
    "from datetime import date\n",
    "from pixeltable.iterators import FrameIterator\n",
    "\n",
    "today = date(2024, 4, 22)\n",
    "videos_table = pxt.get_table('ls_demo.videos')\n",
    "\n",
    "# Create the view, using a `FrameIterator` to extract frames with a sample rate\n",
    "# of `fps=0.25`, or 1 frame per 4 seconds of video. Setting `fps=0` would use the\n",
    "# native framerate of the video, extracting every frame.\n",
    "\n",
    "frames = pxt.create_view(\n",
    "    'ls_demo.frames_2024_04_22',\n",
    "    videos_table.where(videos_table.date == today),\n",
    "    iterator=FrameIterator.create(video=videos_table.video, fps=0.25)\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "f923f511-7624-4537-9eaf-21a8337696fc",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>frame</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_image\" style=\"width:240px;\">\n",
       "                <img src=\"\" width=\"240\" />\n",
       "            </div></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_image\" style=\"width:240px;\">\n",
       "                <img src=\"\" width=\"240\" />\n",
       "            </div></td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_image\" style=\"width:240px;\">\n",
       "                <img src=\"\" width=\"240\" />\n",
       "            </div></td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "                                               frame\n",
       "0  <PIL.Image.Image image mode=RGB size=640x480 a...\n",
       "1  <PIL.Image.Image image mode=RGB size=640x480 a...\n",
       "2  <PIL.Image.Image image mode=RGB size=640x480 a..."
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Show just the first 3 frames in the table, to avoid cluttering the notebook\n",
    "\n",
    "frames.select(frames.frame).head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7865f479-fce9-40a5-a933-330d2cfc2b77",
   "metadata": {},
   "source": [
    "Now we'll use the Resnet-50 object detection model to generate preannotations. We do this by creating a new computed column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "72b0de58-54f4-412c-bd0a-4964d0bab6ce",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Computing cells: 100%|██████████████████████████████████████████| 13/13 [00:04<00:00,  2.71 cells/s]\n",
      "Added 13 column values with 0 errors.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>frame</th>\n",
       "      <th>detections</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_image\" style=\"width:240px;\">\n",
       "                <img src=\"\" width=\"240\" />\n",
       "            </div></td>\n",
       "      <td>{&quot;boxes&quot;: [[584.916, 0.75, 639.989, 321.319], [46.766, 93.549, 294.693, 465.584]], &quot;labels&quot;: [1, 1], &quot;scores&quot;: [0.995, 0.999], &quot;label_text&quot;: [&quot;person&quot;, &quot;person&quot;]}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_image\" style=\"width:240px;\">\n",
       "                <img src=\"\" width=\"240\" />\n",
       "            </div></td>\n",
       "      <td>{&quot;boxes&quot;: [[562.879, 197., 640.206, 229.308], [415.185, 149.31, 427.469, 168.121], [182.444, 160.578, 219.743, 248.855], [413.599, 139.345, 426.33, 160.366]], &quot;labels&quot;: [15, 4, 1, 1], &quot;scores&quot;: [0.981, 0.995, 1., 0.988], &quot;label_text&quot;: [&quot;bench&quot;, &quot;motorcycle&quot;, &quot;person&quot;, &quot;person&quot;]}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_image\" style=\"width:240px;\">\n",
       "                <img src=\"\" width=\"240\" />\n",
       "            </div></td>\n",
       "      <td>{&quot;boxes&quot;: [[387.795, 196.154, 505.848, 372.086]], &quot;labels&quot;: [1], &quot;scores&quot;: [1.], &quot;label_text&quot;: [&quot;person&quot;]}</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "                                               frame  \\\n",
       "0  <PIL.Image.Image image mode=RGB size=640x480 a...   \n",
       "1  <PIL.Image.Image image mode=RGB size=640x480 a...   \n",
       "2  <PIL.Image.Image image mode=RGB size=640x480 a...   \n",
       "\n",
       "                                          detections  \n",
       "0  {'boxes': [[584.916259765625, 0.74955940246582...  \n",
       "1  {'boxes': [[562.8792724609375, 196.99993896484...  \n",
       "2  {'boxes': [[387.7949523925781, 196.15356445312...  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pixeltable.functions.huggingface import detr_for_object_detection\n",
    "\n",
    "# Run the Resnet-50 object detection model against each frame to generate bounding boxes\n",
    "frames['detections'] = detr_for_object_detection(frames.frame, model_id='facebook/detr-resnet-50', threshold=0.95)\n",
    "frames.select(frames.frame, frames.detections).head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3929288b-b68a-40a2-9246-412b11adadb7",
   "metadata": {},
   "source": [
    "We'd like to send these detections to Label Studio as preannotations, but they're not quite ready. Label Studio expects preannotations in standard COCO format, but the Huggingface library outputs them in its own custom format. We can use Pixeltable's handy `detr_to_coco` function to do the conversion, using another computed column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "82cbcc23-1f43-46e4-aa15-85b9148ae946",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Computing cells: 100%|██████████████████████████████████████████| 13/13 [00:00<00:00, 40.50 cells/s]\n",
      "Added 13 column values with 0 errors.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>frame</th>\n",
       "      <th>detections</th>\n",
       "      <th>preannotations</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_image\" style=\"width:240px;\">\n",
       "                <img src=\"\" width=\"240\" />\n",
       "            </div></td>\n",
       "      <td>{&quot;boxes&quot;: [[584.916, 0.75, 639.989, 321.319], [46.766, 93.549, 294.693, 465.584]], &quot;labels&quot;: [1, 1], &quot;scores&quot;: [0.995, 0.999], &quot;label_text&quot;: [&quot;person&quot;, &quot;person&quot;]}</td>\n",
       "      <td>{&quot;image&quot;: {&quot;width&quot;: 640, &quot;height&quot;: 480}, &quot;annotations&quot;: [{&quot;bbox&quot;: [584.916, 0.75, 55.073, 320.57], &quot;category&quot;: 1}, {&quot;bbox&quot;: [46.766, 93.549, 247.927, 372.035], &quot;category&quot;: 1}]}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_image\" style=\"width:240px;\">\n",
       "                <img src=\"\" width=\"240\" />\n",
       "            </div></td>\n",
       "      <td>{&quot;boxes&quot;: [[562.879, 197., 640.206, 229.308], [415.185, 149.31, 427.469, 168.121], [182.444, 160.578, 219.743, 248.855], [413.599, 139.345, 426.33, 160.366]], &quot;labels&quot;: [15, 4, 1, 1], &quot;scores&quot;: [0.981, 0.995, 1., 0.988], &quot;label_text&quot;: [&quot;bench&quot;, &quot;motorcycle&quot;, &quot;person&quot;, &quot;person&quot;]}</td>\n",
       "      <td>{&quot;image&quot;: {&quot;width&quot;: 640, &quot;height&quot;: 480}, &quot;annotations&quot;: [{&quot;bbox&quot;: [562.879, 197., 77.326, 32.308], &quot;category&quot;: 15}, {&quot;bbox&quot;: [415.185, 149.31, 12.283, 18.811], &quot;category&quot;: 4}, {&quot;bbox&quot;: [182.444, 160.578, 37.299, 88.276], &quot;category&quot;: 1}, {&quot;bbox&quot;: [413.599, 139.345, 12.73, 21.021], &quot;category&quot;: 1}]}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td><div class=\"pxt_image\" style=\"width:240px;\">\n",
       "                <img src=\"\" width=\"240\" />\n",
       "            </div></td>\n",
       "      <td>{&quot;boxes&quot;: [[387.795, 196.154, 505.848, 372.086]], &quot;labels&quot;: [1], &quot;scores&quot;: [1.], &quot;label_text&quot;: [&quot;person&quot;]}</td>\n",
       "      <td>{&quot;image&quot;: {&quot;width&quot;: 640, &quot;height&quot;: 480}, &quot;annotations&quot;: [{&quot;bbox&quot;: [387.795, 196.154, 118.053, 175.932], &quot;category&quot;: 1}]}</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "                                               frame  \\\n",
       "0  <PIL.Image.Image image mode=RGB size=640x480 a...   \n",
       "1  <PIL.Image.Image image mode=RGB size=640x480 a...   \n",
       "2  <PIL.Image.Image image mode=RGB size=640x480 a...   \n",
       "\n",
       "                                          detections  \\\n",
       "0  {'boxes': [[584.916259765625, 0.74955940246582...   \n",
       "1  {'boxes': [[562.8792724609375, 196.99993896484...   \n",
       "2  {'boxes': [[387.7949523925781, 196.15356445312...   \n",
       "\n",
       "                                      preannotations  \n",
       "0  {'image': {'width': 640, 'height': 480}, 'anno...  \n",
       "1  {'image': {'width': 640, 'height': 480}, 'anno...  \n",
       "2  {'image': {'width': 640, 'height': 480}, 'anno...  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from pixeltable.functions.huggingface import detr_to_coco\n",
    "\n",
    "frames['preannotations'] = detr_to_coco(frames.frame, frames.detections)\n",
    "frames.select(frames.frame, frames.detections, frames.preannotations).head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2eef3866-7a14-4d17-9abc-71e1158f5020",
   "metadata": {},
   "source": [
    "## Create a Label Studio Project for Frames"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2bd4b9c6-4da9-4112-8b48-edc8e39674f6",
   "metadata": {},
   "source": [
    "With our data workflow set up and the COCO preannotations prepared, all that's left is to create a corresponding Label Studio project. Note how Pixeltable automatically maps `RectangleLabels` preannotation fields to columns, just like it does with data fields. Here, Pixeltable interprets the `name=\"preannotations\"` attribute in `RectangleLabels` to mean, \"map these rectangle labels to the `preannotations` column in my linked table or view\".\n",
    "\n",
    "The Label values `car`, `person`, and `train` are standard COCO object identifiers used by many off-the-shelf object detection models. You can find the complete list of them here, and include as many as you wish: https://raw.githubusercontent.com/pixeltable/pixeltable/master/docs/release/coco-categories.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "bd39e113-c894-4c3b-9401-8388cbbf1a8b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Added 13 column values with 0 errors.\n",
      "Computing cells: 100%|██████████████████████████████████████████| 13/13 [00:00<00:00, 41.27 cells/s]\n",
      "Linked external store `ls_project_0` to table `frames_2024_04_22`.\n",
      "Created 13 new task(s) in LabelStudioProject `frames_2024_04_22`.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "SyncStatus(external_rows_created=13, external_rows_deleted=0, external_rows_updated=0, pxt_rows_updated=0, num_excs=0)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frames_config = '''\n",
    "    <View>\n",
    "      <Image name=\"frame\" value=\"$frame\"/>\n",
    "      <RectangleLabels name=\"preannotations\" toName=\"frame\">\n",
    "        <Label value=\"car\" background=\"blue\"/>\n",
    "        <Label value=\"person\" background=\"red\"/>\n",
    "        <Label value=\"train\" background=\"green\"/>\n",
    "      </RectangleLabels>\n",
    "    </View>\n",
    "    '''\n",
    "\n",
    "pxt.io.create_label_studio_project(frames, frames_config)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e48446f9-39af-4017-9ab9-72a8951131b5",
   "metadata": {},
   "source": [
    "If you go into Label Studio and open up the new project, you can see the effect of adding the preannotations from Resnet-50 to our workflow."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f16ee59-4200-4a98-9965-eaa72727d103",
   "metadata": {},
   "source": [
    "## Incremental Updates\n",
    "\n",
    "As we saw in the [Pixeltable Basics](https://pixeltable.readme.io/docs/pixeltable-basics) tutorial, adding new data to Pixeltable results in incremental updates of everything downstream. We can see this by inserting a new video into our base videos table: all of the downstream views and computed columns are updated automatically, including the video metadata, frames, and preannotations.\n",
    "\n",
    "The update may take some time, so please be patient (it involves a sequence of operations, including frame extraction and object detection)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "dd371a24-795a-4c3e-9d92-e1310013d915",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Inserting rows into `videos`: 1 rows [00:00, 597.73 rows/s]\n",
      "Inserting rows into `videos_2024_04_22`: 1 rows [00:00, 991.33 rows/s]\n",
      "Inserting rows into `frames_2024_04_22`: 5 rows [00:00, 2816.86 rows/s]\n",
      "Inserted 7 rows with 0 errors.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "UpdateStatus(num_rows=7, num_computed_values=0, num_excs=0, updated_cols=[], cols_with_excs=[])"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "videos_table.insert(\n",
    "    video=url_prefix + '22a/948/22a9487a92956ac453a9c15e0fc4dd4.mp4',\n",
    "    date=today\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1da98d43-fe42-4d91-ba3d-a5c4b32170cb",
   "metadata": {},
   "source": [
    "Note that the incremental updates do _not_ automatically sync the `Table` with the remote Label Studio projects. To issue a sync, we have to call the `sync()` methods separately. Note that tasks will be created only for the _newly added_ rows in the videos and frames views, not the existing ones."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "9799e354-3fcf-4755-8192-135e94a6a8ef",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Created 1 new task(s) in LabelStudioProject `videos_2024_04_22`.\n",
      "Created 5 new task(s) in LabelStudioProject `frames_2024_04_22`.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "SyncStatus(external_rows_created=5, external_rows_deleted=0, external_rows_updated=0, pxt_rows_updated=0, num_excs=0)"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "v.sync()\n",
    "frames.sync()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb7f04bb-d2ba-4cbc-86c0-a3995f2118a5",
   "metadata": {},
   "source": [
    "## Deleting a Project\n",
    "\n",
    "To remove a Label Studio project from a table or view, use `unlink_external_stores()`, as demonstrated by the following example. If you specify `delete_external_data=True`, then the Label Studio project will also be deleted, along with all existing data and annotations (be careful!) If `delete_external_data=False`, then the Label Studio project will be unlinked from Pixeltable, but the project and data will remain in Label Studio (so you'll need to delete the project manually if you later want to get rid of it)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "d5a28b50-e7ac-4302-b675-1394c57a9843",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['ls_project_0']"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "v.external_stores  # Get a list of all external stores for `v`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "b0055b97-5912-4073-b6be-af9d894f5f1c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Deleted Label Studio project: videos_2024_04_22\n",
      "Unlinked external store from table `videos_2024_04_22`: ls_project_0\n"
     ]
    }
   ],
   "source": [
    "v.unlink_external_stores('ls_project_0', delete_external_data=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20569f26-85ad-45e5-8746-ebf376d77519",
   "metadata": {},
   "source": [
    "## Configuring `media_import_method`\n",
    "\n",
    "All of the examples so far in this tutorial use HTTP file uploads to send media data to Label Studio. This is the simplest method and the easiest to configure, but it's undesirable for complex projects or projects with a lot of data. In fact, the Label Studio documentation includes this specific warning: \"Uploading data works fine for proof of concept projects, but it is not recommended for larger projects.\"\n",
    "\n",
    "In Pixeltable, you can configure linked Label Studio projects to use URLs for media data (instead of file uploads) by specifying the `media_import_method='url'` argument in `create_label_studio_project`. This is recommended for all production applications, and is mandatory for projects whose input configuration is more complex than a single media file (in the Label Studio parlance, projects with more than one \"data key\").\n",
    "\n",
    "If `media_import_method='url'`, then Pixeltable will simply pass the media data URLs directly to Label Studio. If the URLs are `http://` or `https://` URLs, then nothing more needs to be done.\n",
    "\n",
    "Label Studio also supports `s3://` URLs with credentialed access. To use them, you'll need to configure access to your bucket in the project configuration. The simplest way to do this is by specifying an `s3_configuration` in `create_label_studio_project`. Here's an example, though it won't work directly in this demo notebook, since it relies on having an access key. (If your AWS credentials are stored in `~/.aws/credentials`, then you can omit the access key and secret, and Pixeltable will fill them in automatically.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d418d367-c188-443e-b911-9325fc37d841",
   "metadata": {},
   "outputs": [],
   "source": [
    "pxt.io.create_label_studio_project(\n",
    "    v,\n",
    "    label_config,\n",
    "    media_import_method='url',\n",
    "    s3_configuration={'bucket': 'pxt-test', 'aws_access_key_id': my_key, 'aws_secret_access_key': my_secret}\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7df6cb2-d8a4-4141-91a7-ad97e75dabee",
   "metadata": {},
   "source": [
    "Before you can set up credentialed S3 access, you'll need to configure your S3 bucket to work with Label Studio; the details on how to do this are described here:\n",
    "- [Label Studio Docs: Amazon S3](https://labelstud.io/guide/storage.html#Amazon-S3)\n",
    "\n",
    "For the full documentation on `create_label_studio_project` usage, see:\n",
    "- [Pixeltable API Docs: create_label_studio_project](https://pixeltable.github.io/pixeltable/api/io/#pixeltable.io.create_label_studio_project)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "de46c9d9-edb6-43eb-be6e-d4e857477622",
   "metadata": {},
   "source": [
    "## Notebook Cleanup\n",
    "\n",
    "That's the end of the tutorial! To conclude, let's terminate the running Label Studio process. (Of course, feel free to leave it running if you want to play around with it some more.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "d44f708a-207b-41b6-bbdb-1026809e794d",
   "metadata": {},
   "outputs": [],
   "source": [
    "ls_process.kill()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
